Note 6 Functional Dependency Theory

Author: Zhen Tong 120090694@link.cuhk.edu.cn
Lecturer: Jan YOU

To better understand this chapter, you are encouraged to play with the functional dependency calculator in lab2 on my GitHub ヾ(≧▽≦*)o

Closure of a Set of FDs

In the last note we learned the concept of the closure of a set of functional dependencies. Now let's see how to derive $F^+$.

We can compute $F^+$ by repeatedly applying Armstrong’s Axioms.

Reflexivity rule: if $\beta\subseteq\alpha$, then $\alpha\rightarrow\beta$

Augmentation rule: if $\alpha\rightarrow\beta$, then $\gamma\alpha\rightarrow\gamma\beta$

Transitivity rule: if $\alpha\rightarrow\beta$ and $\beta\rightarrow\gamma$, then $\alpha\rightarrow\gamma$

From Armstrong's Axioms we can infer three additional rules.

Union rule: if $\alpha\rightarrow\beta$ holds and $\alpha\rightarrow\gamma$ holds, then $\alpha\rightarrow\beta\gamma$ holds

Decomposition rule: if $\alpha\rightarrow\beta\gamma$ holds, then $\alpha\rightarrow\beta$ holds and $\alpha\rightarrow\gamma$ holds

Pseudotransitivity rule: if $\alpha\rightarrow\beta$ holds and $\gamma\beta\rightarrow\delta$ holds, then $\gamma\alpha\rightarrow\delta$ holds
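As a quick check, each additional rule can be derived from the three axioms alone. The derivation below is a standard sketch written out as a worked example; it is not copied from the lecture.

```latex
\begin{align*}
\textbf{Union:}\quad
  & \alpha\rightarrow\beta \;\Rightarrow\; \alpha\rightarrow\alpha\beta
      && \text{(augment with } \alpha\text{)} \\
  & \alpha\rightarrow\gamma \;\Rightarrow\; \alpha\beta\rightarrow\beta\gamma
      && \text{(augment with } \beta\text{)} \\
  & \alpha\rightarrow\alpha\beta,\ \alpha\beta\rightarrow\beta\gamma
      \;\Rightarrow\; \alpha\rightarrow\beta\gamma
      && \text{(transitivity)} \\[4pt]
\textbf{Decomposition:}\quad
  & \beta\gamma\rightarrow\beta
      && \text{(reflexivity, } \beta\subseteq\beta\gamma\text{)} \\
  & \alpha\rightarrow\beta\gamma,\ \beta\gamma\rightarrow\beta
      \;\Rightarrow\; \alpha\rightarrow\beta
      && \text{(transitivity; same for } \gamma\text{)} \\[4pt]
\textbf{Pseudotransitivity:}\quad
  & \alpha\rightarrow\beta \;\Rightarrow\; \gamma\alpha\rightarrow\gamma\beta
      && \text{(augment with } \gamma\text{)} \\
  & \gamma\alpha\rightarrow\gamma\beta,\ \gamma\beta\rightarrow\delta
      \;\Rightarrow\; \gamma\alpha\rightarrow\delta
      && \text{(transitivity)}
\end{align*}
```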

The following computes the closure of a set of functional dependencies F
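One way to make this concrete is a brute-force fixpoint that applies the three axioms until nothing new appears. The sketch below is only an illustration: the names `powerset` and `fd_closure` are made up here, FDs are assumed to be `(lhs, rhs)` pairs, and the running time is exponential in the number of attributes, so it is only usable on tiny schemas.

```python
from itertools import combinations

def powerset(attrs):
    """All subsets of attrs as frozensets (hypothetical helper)."""
    items = sorted(attrs)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def fd_closure(attrs, fds):
    """Brute-force F+ over the schema attrs; fds are (lhs, rhs) pairs."""
    subsets = powerset(attrs)
    # Reflexivity: alpha -> beta for every beta that is a subset of alpha.
    closure = {(a, b) for a in subsets for b in subsets if b <= a}
    closure |= {(frozenset(a), frozenset(b)) for a, b in fds}
    changed = True
    while changed:
        changed = False
        new = set()
        # Augmentation: alpha -> beta gives gamma alpha -> gamma beta.
        for a, b in closure:
            for g in subsets:
                new.add((a | g, b | g))
        # Transitivity: alpha -> beta and beta -> gamma give alpha -> gamma.
        for a, b in closure:
            for c, d in closure:
                if b == c:
                    new.add((a, d))
        if not new <= closure:
            closure |= new
            changed = True
    return closure

# Toy schema R = (A, B, C) with F = {A -> B, B -> C}.
f_plus = fd_closure("ABC", [("A", "B"), ("B", "C")])
print((frozenset("A"), frozenset("C")) in f_plus)   # True
print((frozenset("B"), frozenset("A")) in f_plus)   # False
```

In practice we rarely materialize $F^+$; the attribute closure in the next section answers the same questions much more cheaply.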

Closure of Attribute Sets

Given a set of attributes $\alpha$, define the closure of $\alpha$ under $F$ (denoted by $\alpha^+$) as the set of attributes that are functionally determined by $\alpha$ under $F$.

In Python, the procedure looks like this:

def sub_closure(self, sub_attrs: set, fds=None) -> set:
    '''
    Return alpha+, the closure of the attribute set sub_attrs.
    fds is the set of FDs to use; it defaults to self.FDs.
    '''
    fds = self.FDs if fds is None else fds
    # alpha always determines itself, so start from a copy of alpha
    result = set(sub_attrs)
    changed = True
    while changed:
        changed = False
        for fd in fds:
            # fd fires once its left side is inside the current result
            if fd.alpha.issubset(result) and not fd.beta.issubset(result):
                result |= fd.beta
                changed = True
    return result

Example of Attribute Set Closure

$R = (A, B, C, G, H, I)$

$F = \{A \rightarrow B, A \rightarrow C , CG \rightarrow H, CG \rightarrow I, B \rightarrow H\}$

We can compute the closure of any attribute set over this schema with the function above.
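For instance, here is a standalone sketch that runs the same fixpoint on the example schema. The free function `attribute_closure` and the string encoding of attribute sets (`"CG"` for $\{C, G\}$) are assumptions made for brevity, not the class from the lab.

```python
def attribute_closure(attrs, fds):
    """Closure of attrs under fds, where fds are (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs once lhs is contained in the result so far.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# F from the example above.
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

ag_plus = attribute_closure("AG", F)
a_plus = attribute_closure("A", F)
print("".join(sorted(ag_plus)))  # ABCGHI
print("".join(sorted(a_plus)))   # ABCH
```

Starting from $AG$, the dependencies $A\rightarrow B$ and $A\rightarrow C$ add $B$ and $C$, then $CG\rightarrow H$ and $CG\rightarrow I$ add $H$ and $I$, so $(AG)^+$ is all of $R$.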

There are several uses of the attribute closure algorithm:

Testing for superkey:
- To test if $\alpha$ is a superkey, we compute $\alpha^+$ and check if $\alpha^+$ contains all attributes of $R$

Testing functional dependencies:
- To check if a functional dependency $\alpha\rightarrow\beta$ holds (or, in other words, is in $F^+$), just check if $\beta\subseteq\alpha^+$
- That is, we compute $\alpha^+$ by using attribute closure, and then check if it contains $\beta$

Or we may simply want the closure of some subset of the attributes for its own sake.
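Both tests above are one-liners on top of the closure. The sketch below reuses the example schema; `is_superkey` and `fd_holds` are illustrative names, and attribute sets are again written as strings.

```python
def attribute_closure(attrs, fds):
    """Closure of attrs under fds, where fds are (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(alpha, r, fds):
    """alpha is a superkey of r iff alpha+ contains every attribute of r."""
    return set(r) <= attribute_closure(alpha, fds)

def fd_holds(alpha, beta, fds):
    """alpha -> beta is in F+ iff beta is a subset of alpha+."""
    return set(beta) <= attribute_closure(alpha, fds)

R = "ABCGHI"
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

print(is_superkey("AG", R, F))   # True
print(is_superkey("A", R, F))    # False
print(fd_holds("A", "H", F))     # True, via A -> B and B -> H
print(fd_holds("G", "A", F))     # False
```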

Canonical Cover

A canonical cover is a minimal and simplified set of functional dependencies that has the same closure as the original set of functional dependencies on a relation schema. It is used to make it more efficient to ensure that functional dependencies are not violated when updates are performed on the database.

In short, a canonical cover summarizes the essence of a set of FDs.

Ensuring FD Satisfaction: When updates (inserts, updates, deletes) are performed on the database, the database system needs to ensure that these updates do not violate any of the functional dependencies in the set F. If an update would cause a violation, it must be rolled back to maintain data consistency.

Canonical Cover for Efficiency: To make the process of checking for violations more efficient, you can create a simplified set of functional dependencies known as the "canonical cover." The canonical cover contains only the essential functional dependencies required to represent the same closure (the set of all FDs derivable from F).

Extraneous Attributes: To construct the canonical cover, you must identify and eliminate extraneous attributes. An attribute in a functional dependency in F is extraneous if you can remove it without changing the closure of F. In other words, the dependency with that attribute removed, together with the rest of F, still implies everything the original set did.

Test if an Attribute is Extraneous

Let $R$ be a relation schema and let $F$ be a set of functional dependencies that hold on $R$. Consider an attribute in the functional dependency $\alpha\rightarrow\beta$.

To test if attribute $A\in\beta$ is extraneous in $\beta$:

Consider the set obtained by removing $A$ from the right side of the FD:

$F' = (F-\{\alpha\rightarrow\beta\})\cup\{\alpha\rightarrow(\beta-A)\}$

Check whether $\alpha^+$ contains $A$ under $F'$
- If it does, $A$ is extraneous in $\beta$

To test if attribute $A\in\alpha$ is extraneous in $\alpha$:

Let $\gamma = \alpha-A$. Check if $\gamma\rightarrow\beta$ can be inferred from $F$
- Compute $\gamma^+$ using the dependencies in $F$
- If $\gamma^+$ includes all attributes in $\beta$, then $A$ is extraneous in $\alpha$

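In Python the code is:

import copy

def is_extranious(self, fd: FunctionalDependency, A: set, determinant: bool) -> bool:
    '''
    Test whether the attribute(s) in A are extraneous in fd.
    determinant=True  tests A on the left side (alpha),
    determinant=False tests A on the right side (beta).
    '''
    alpha = copy.deepcopy(fd.alpha)
    beta = copy.deepcopy(fd.beta)
    if determinant:
        # A in alpha: check that (alpha - A) -> beta already follows from F
        gamma = alpha.difference(A)
        gamma_plus = self.sub_closure(gamma)
        return beta.issubset(gamma_plus)
    # A in beta: build F' by replacing alpha -> beta with alpha -> (beta - A),
    # then check that alpha+ under F' still reaches A.
    # This assumes sub_closure accepts the FD set to use as a second argument.
    F = set(self.FDs)
    F_prime = F.difference({fd}).union({FunctionalDependency(alpha, beta.difference(A))})
    alpha_plus = self.sub_closure(alpha, F_prime)
    return A.issubset(alpha_plus)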

Now we can compute the Canonical Cover

A canonical cover for $F$ is a set of dependencies $F_c$ such that:

$F$ logically implies all dependencies in $F_c$

$F_c$ logically implies all dependencies in $F$

No functional dependency in $F_c$ contains an extraneous attribute

Each left side of a functional dependency in $F_c$ is unique.
- That is, there are no two dependencies $\alpha_1\rightarrow\beta_1$ and $\alpha_2\rightarrow\beta_2$ in $F_c$ such that $\alpha_1 = \alpha_2$

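To compute a canonical cover for $F$, we loop over two steps until $F_c$ stops changing.

The first step in the loop enforces the fourth condition (unique left sides): by the union rule, any two dependencies $\alpha\rightarrow\beta_1$ and $\alpha\rightarrow\beta_2$ are replaced with $\alpha\rightarrow\beta_1\beta_2$, after which the two old dependencies are no longer needed.

The second step in the loop enforces the third condition: find a functional dependency with an extraneous attribute and remove that attribute from it.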
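The two-step loop can be sketched as follows. This is a minimal standalone version, not the lab implementation: FDs are assumed to be `(lhs, rhs)` pairs, and `find_extraneous` and `canonical_cover` are names chosen here. Canonical covers are not unique in general, so a different scan order may give a different (equally valid) result.

```python
def attribute_closure(attrs, fds):
    """Closure of attrs under fds, where fds are (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_extraneous(fds):
    """Return (index, new_lhs, new_rhs) for the first extraneous attribute, else None."""
    for i, (lhs, rhs) in enumerate(fds):
        # Left side: A is extraneous if (lhs - A) -> rhs follows from F.
        for attr in sorted(lhs):
            gamma = lhs - {attr}
            if rhs <= attribute_closure(gamma, fds):
                return i, gamma, rhs
        # Right side: A is extraneous if lhs+ under F' still contains A.
        for attr in sorted(rhs):
            reduced = rhs - {attr}
            f_prime = fds[:i] + [(lhs, reduced)] + fds[i + 1:]
            if attr in attribute_closure(lhs, f_prime):
                return i, lhs, reduced
    return None

def canonical_cover(fds):
    fc = [(frozenset(l), frozenset(r)) for l, r in fds]
    while True:
        # Step 1: union rule, so that every left side appears only once.
        merged = {}
        for lhs, rhs in fc:
            merged[lhs] = merged.get(lhs, frozenset()) | rhs
        fc = list(merged.items())
        # Step 2: drop one extraneous attribute, then loop again.
        hit = find_extraneous(fc)
        if hit is None:
            return set(fc)
        i, lhs, rhs = hit
        # An FD whose right side became empty is removed altogether.
        fc[i:i + 1] = [(lhs, rhs)] if rhs else []

# Toy input: F = {A -> BC, B -> C, A -> B, AB -> C}.
F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
Fc = canonical_cover(F)
for lhs, rhs in sorted(Fc, key=lambda fd: sorted(fd[0])):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
# A -> B
# B -> C
```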

Dependency Preservation

If we decompose the relation schema $R$ into $R_1, ..., R_n$, the restriction of the functional dependency set $F$ to $R_i$ is the set $F_i$ of all functional dependencies in $F^+$ that include only attributes of $R_i$.

Note that the definition of restriction uses all dependencies in $F^+$, not just those in $F$.

A decomposition is dependency-preserving if $(F_1\cup F_2\cup ...\cup F_n)^+ = F^+$
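Checking this definition literally would require computing $F^+$. A common alternative is to test each dependency $\alpha\rightarrow\beta$ in $F$ separately, growing a result set from $\alpha$ using only what each $R_i$ can see. The sketch below follows that idea; the function names and the string encoding of attribute sets are assumptions made for illustration.

```python
def attribute_closure(attrs, fds):
    """Closure of attrs under fds, where fds are (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserves_dependencies(fds, decomposition):
    """True if every FD in fds is implied by the union of the restrictions."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                ri = set(ri)
                # Attributes of ri determined by the part of result inside ri.
                t = attribute_closure(result & ri, fds) & ri
                if not t <= result:
                    result |= t
                    changed = True
        if not set(rhs) <= result:
            return False
    return True

# Toy schema R = (A, B, C) with F = {A -> B, B -> C}.
F = [("A", "B"), ("B", "C")]
print(preserves_dependencies(F, ["AB", "BC"]))  # True
print(preserves_dependencies(F, ["AB", "AC"]))  # False, B -> C is lost
```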

BCNF Decomposition Algorithm

In the BCNF decomposition algorithm, we first need to check whether each decomposed relation schema $R_i$ in the result is in BCNF. We can do either of two things:

Naively, you check the BCNF condition on the restriction. Given a decomposed relation schema $R_i$, derive the restriction of $F^+$ to $R_i$. Then check the BCNF condition for every dependency in the restriction.
- Recap: the BCNF check loops over every functional dependency $\alpha\rightarrow\beta$ in the set and requires that at least one of the following holds:
  - $\alpha\rightarrow\beta$ is trivial
  - $\alpha$ is a superkey for $R_i$

Or you can do this: use the original set of dependencies $F$ that hold on $R$, but with the following test:
- For every attribute combination $\alpha\subseteq R_i$, check that $\alpha^+$ (the attribute closure of $\alpha$) either includes no attribute of $R_i - \alpha$, or includes all attributes of $R_i$
- If $\alpha^+$ includes none of the attributes of $R_i - \alpha$, that means $\alpha$ is not a (non-trivial) determinant within $R_i$
- If $\alpha^+$ includes all of the attributes of $R_i$, then $\alpha$ is a superkey of $R_i$

Now back to the BCNF decomposition algorithm: once we detect a functional dependency $\alpha\rightarrow\beta$ that violates BCNF in $R_i$, we decompose $R_i$ into $R_i-\beta$ and $\{\alpha, \beta\}$.
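Putting the second test and the split together gives the sketch below. It is a simplified illustration under a few assumptions: attribute sets are plain Python sets (or strings), the helper names are made up, and for a violating $\alpha$ the right side $\beta$ is taken to be every attribute of $R_i$ outside $\alpha$ that $\alpha$ determines, which keeps $\alpha$ and $\beta$ disjoint. The subset enumeration is exponential in the size of $R_i$.

```python
from itertools import combinations

def attribute_closure(attrs, fds):
    """Closure of attrs under fds, where fds are (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_violation(ri, fds):
    """Return (alpha, beta) violating BCNF inside ri, or None if ri is in BCNF."""
    items = sorted(ri)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            alpha = set(combo)
            closure = attribute_closure(alpha, fds)
            beta = (closure & ri) - alpha
            # Violation: alpha determines something in ri but is not a superkey of ri.
            if beta and not ri <= closure:
                return alpha, beta
    return None

def bcnf_decompose(r, fds):
    result = [set(r)]
    done = False
    while not done:
        done = True
        for i, ri in enumerate(result):
            hit = find_violation(ri, fds)
            if hit:
                alpha, beta = hit
                # Replace ri with (ri - beta) and (alpha, beta).
                result[i:i + 1] = [ri - beta, alpha | beta]
                done = False
                break
    return result

# Toy schema R = (A, B, C) with F = {A -> B, B -> C}; B -> C violates BCNF.
parts = bcnf_decompose("ABC", [("A", "B"), ("B", "C")])
print(sorted("".join(sorted(p)) for p in parts))  # ['AB', 'BC']
```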

Notice that the algorithm does not check dependency preservation: it never merges the restrictions $F_i$ of the $R_i$ to see whether $(F_1\cup F_2\cup ...\cup F_n)^+ = F^+$ holds.

BCNF or 3NF

There are some situations where:
- BCNF is not dependency preserving, and
- efficient checking for FD violations on updates is important

In these situations we can use the Third Normal Form (3NF):
- It allows some redundancy
- But functional dependencies can be checked on individual relations without computing a join
- There is always a lossless-join, dependency-preserving decomposition into 3NF

3NF Decomposition Algorithm

Before we introduce the 3NF Decomposition Algorithm, let’s first recap the 3NF definition:

A relation $R$ is in 3NF if every $\alpha\rightarrow\beta$ in $F^+$ is not a transitive functional dependency, which means at least one of the following holds:
- $\alpha\rightarrow\beta$ is trivial $(\beta\subseteq\alpha)$, because a trivial dependency is not a transitive functional dependency.
- $\alpha$ is a superkey for $R$, because the determinant in a transitive functional dependency is a non-prime attribute. If $\alpha$ is a superkey, then the dependency cannot be a partial FD and $\alpha$ cannot be a non-prime attribute.
- Each attribute $A$ in $\beta-\alpha$ is contained in a candidate key for $R$. For example, $\alpha$ and $\beta$ may both be candidate keys that share no attributes. (NOTE: the third condition does not say that a single candidate key must contain all the attributes in $\beta-\alpha$; each attribute $A$ in $\beta-\alpha$ may be contained in a different candidate key)
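The three conditions translate directly into a brute-force check. The sketch below enumerates candidate keys by trying attribute subsets in order of size, which is exponential and only meant for small examples. It tests the dependencies in $F$ only, which is enough when checking a whole relation $R$ against its own $F$. The function names and the string encoding of attribute sets are assumptions for illustration.

```python
from itertools import combinations

def attribute_closure(attrs, fds):
    """Closure of attrs under fds, where fds are (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(r, fds):
    """All minimal attribute sets whose closure covers r."""
    r = set(r)
    keys = []
    items = sorted(r)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            alpha = set(combo)
            # Skip supersets of a key already found: they are not minimal.
            if any(k <= alpha for k in keys):
                continue
            if r <= attribute_closure(alpha, fds):
                keys.append(alpha)
    return keys

def is_3nf(r, fds):
    r = set(r)
    prime = set().union(*candidate_keys(r, fds))  # attributes in some candidate key
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        if rhs <= lhs:                              # condition 1: trivial
            continue
        if r <= attribute_closure(lhs, fds):        # condition 2: superkey
            continue
        if (rhs - lhs) <= prime:                    # condition 3: all prime
            continue
        return False
    return True

# R = (J, K, L) with F = {JK -> L, L -> K}: candidate keys are JK and JL.
print(is_3nf("JKL", [("JK", "L"), ("L", "K")]))  # True
# R = (A, B, C) with F = {A -> B, B -> C}: B -> C is transitive.
print(is_3nf("ABC", [("A", "B"), ("B", "C")]))   # False
```

The first schema is in 3NF but not in BCNF, since $L\rightarrow K$ has a determinant that is not a superkey, yet $K$ belongs to the candidate key $JK$.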

We know BCNF is a sufficient condition for 3NF, i.e. if a relation schema is in BCNF, it is also in 3NF. Now let's have a look at the 3NF decomposition.